Analysis and Visualization of Animal Rescue Incidents happened in London

Outline:

  1. Notebook instructions
  2. Introduction
  3. Data Cleaning
  4. Data Analysis and Visualization:
    4.1 Pattern of the animal rescue acts yearly and monthly
    4.2 Which kind of animal being rescued the most?
    4.3 What is the most frequent act of rescue?
    4.4 Where do animals usually encounter problem?
    4.5 Geographical Visualization - No token needed
    4.6 Geographical Visualization - Token needed
  5. Model Construction on incurred cost
    5.1 Linear Model
    5.2 Logistic Regression
  6. Conclusion

1. Notebook Instructions

Users could reproduce all the results as shown, while several packages should be installed first:

  1. wordcloud: (conda-forge. conda install -c conda-forge wordcloud);
  2. Plotly: (conda install -c plotly plotly or conda install -c plotly/label/test plotly)

Apart from these, if you want to reproduce the geographical visualization in Section 4.6, you need to request for a token from MapBox (https://www.mapbox.com/). Simply register on the website, then you could access the public token to play with maps and location data. If you do not want to request for a token, I also provide methods without it in Section 4.5.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from wordcloud import WordCloud
from PIL import Image
import plotly
import plotly.express as px
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, plot, iplot, init_notebook_mode
init_notebook_mode()
import json
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# to avoid the identification information 
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

List of all data resources used:

  1. 'Animal Rescue Incidents attended by LFB': https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb.
  2. 'Wards in London after 2018 elections': https://data.london.gov.uk/dataset/excel-mapping-template-for-london-boroughs-and-wards.
  3. 'Local Authority Districts (April 2019) Names and Codes in the United Kingdom': https://geoportal.statistics.gov.uk/datasets/c3ddcd23a15c4d7985d8b36f1344b1db_0/data.
  4. 'postcode-outcodes' dataset from https://www.freemaptools.com/download-uk-postcode-lat-lng.htm.

2. Introduction

Under the pandemic period, human nature is revealed more deeply. I was astonished when I saw the news last November that Denmark to cull 17 million minks in response to the Covid-19 outbreaks at more than 200 mink farms. I am not going to discuss the humanitarian issue of this matter here. While this news reminds me of the high school days when I participated in volunteer activities hosted by RSPCA. (Royal Society for the Prevention of Cruelty to Animals, the oldest and largest animal welfare organisation in the world.)

Therefore, I decided to browsing for a dataset relating to animal rescue, with the help of analysis and visualization tools to understand the current status of animal rescue. Eventuallly, I found a dataset called 'Animal Rescue Incidents Attended by LFB' on London Datastore (https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb). LFB stands for the London Fire Brigade, which is run by the London Fire Commissioner -- the fire and rescue authority for London. They attends a range of non-fire incidents (which they call 'special services). These 'special services' include assistance to animals that may be trapped or in distress. LFB keeps updating this dataset monthly since January 2009. There are 7224 records in total, including information like: Date of the incidence, Type of Incident, Rescued animal condition, Cost incurred, Location information (postcode, borough, ward) and so on. Thus I decide to use this dataset, from which I could have a look at questions I am interested in, such as: which kind of animals gets into trouble the most, what kind of difficulty they encountered, and where does the rescue act often take place, etc. Bearing these questions in mind, let us move on to data pre-processing firstly.

3. Data Cleaning

The condition of the dataset is as shown below:

In [2]:
file = "Animal Rescue incidents attended by LFB from Jan 2009 (1).csv"
Animal = pd.read_csv(file, encoding='latin-1')
print(Animal.shape)
print(Animal.isnull().sum())
(7224, 31)
IncidentNumber                   0
DateTimeOfCall                   0
CalYear                          0
FinYear                          0
TypeOfIncident                   0
PumpCount                       47
PumpHoursTotal                  48
HourlyNotionalCost(£)            0
IncidentNotionalCost(£)         48
FinalDescription                 5
AnimalGroupParent                0
OriginofCall                     0
PropertyType                     0
PropertyCategory                 0
SpecialServiceTypeCategory       0
SpecialServiceType               0
WardCode                         9
Ward                             9
BoroughCode                      8
Borough                          8
StnGroundName                    0
UPRN                          4692
Street                           0
USRN                          1156
PostcodeDistrict                 0
Easting_m                     3673
Northing_m                    3673
Easting_rounded                  0
Northing_rounded                 0
Latitude                      3673
Longitude                     3673
dtype: int64

We could observe from the output above that there are 14 variables out of 31 containing missing values, next I will analysis and tackle them one by one.

1. PumpCount:

PumpCount records the number of pump machine being used during each rescue act. We could observe from the output of 'describe' command below that up to 75% quantile of PumpCount remains the value 1. Thus I fill the missing values in with its medium value 1.

In [3]:
print(Animal['PumpCount'].describe())
PumpCountMed = Animal['PumpCount'].median()
Animal['PumpCount'].fillna(PumpCountMed, inplace = True)
count    7177.000000
mean        1.020761
std         0.154777
min         1.000000
25%         1.000000
50%         1.000000
75%         1.000000
max         4.000000
Name: PumpCount, dtype: float64

2. PumpHoursTotal

PumpHoursTotal as its literal meaning, recording the total hour took by pumping. The value of PumpHoursTotal remains 1.0 from 25% to 75% quantile of PumpHoursTotal. To avoid the effect of outliers, I fill the missing values in with its medium value 1.

In [4]:
print(Animal['PumpHoursTotal'].describe())
PumpHoursTotalMed = Animal['PumpHoursTotal'].median()
Animal['PumpHoursTotal'].fillna(PumpHoursTotalMed, inplace = True)
count    7176.000000
mean        1.184225
std         0.643645
min         0.000000
25%         1.000000
50%         1.000000
75%         1.000000
max        12.000000
Name: PumpHoursTotal, dtype: float64

3. IncidentNotionalCost(£)

IncidentNotionalCost is the estimated cost of each rescue act. As we could observe from below, the maximum value is more than 10 times of the cost at 75% quantile, thus I choose its median value to fill in for missing values.

In [5]:
print(Animal['IncidentNotionalCost(£)'].describe())
IncidentNotionalCostMed = Animal['IncidentNotionalCost(£)'].median()
Animal['IncidentNotionalCost(£)'].fillna(IncidentNotionalCostMed, inplace = True)
count    7176.000000
mean      354.105909
std       196.038916
min         0.000000
25%       260.000000
50%       298.000000
75%       339.000000
max      3912.000000
Name: IncidentNotionalCost(£), dtype: float64

4. FinalDescription

FinalDescription is brief descriptions regarding each animal incidence. I think it is better for me not to fill it in with my subjective assumptions.

5&6. WardCode & Ward

The ward is a local authority area, typically used for electoral purpose. During the data cleaning stage, I did not find any existing dataset resources perfectly match to this dataset. Considering the amount of missing value is small, thus I search information manually online.

General procedure: At the beginning, I type in the provided street name on https://www.streetcheck.co.uk to get the ward information. If it did not mention any information about ward, I will search it on google generally. Then, I will use the ward information to determine the Ward code according to the dataset 'Wards in London after 2018 elections': https://data.london.gov.uk/dataset/excel-mapping-template-for-london-boroughs-and-wards.

7&8. BoroughCode & Borough

Borough is the administrative division in various English-speaking countries. Again, I was not able to find any direct data resources. Thus, firstly, I search the corresponding borough on Wiki according to the postcode provided in the dataset. For most of the postcode, Wiki returns several boroughs, then I will need to search the street name on google map to further determine which borough district that record belongs to. After that, determine the borough code according to the dataset 'Local Authority Districts (April 2019) Names and Codes in the United Kingdom': https://geoportal.statistics.gov.uk/datasets/c3ddcd23a15c4d7985d8b36f1344b1db_0/data.

In [6]:
Animal[Animal['WardCode'].isna()] # output missing WardCode value index
IndexNo = [2304, 4137, 4164, 4245, 4708, 5896, 6338, 6767, 7071]
WardIn = ['Chase', 'Chigwell Village', 'High Street', 'Buckhurst Hill East', 'Chingford Green', 'Peninsula', 'Wexham & Fulmer', 'Ruxley', 'Nonsuch']
WardCodeIn = ['E05000195', 'E05004151', 'E05004996', 'E05004148', 'E05000593', 'E05002256', 'E05010579', 'E05007280', 'E05000561']
for i in range(len(IndexNo)):
    Animal.at[IndexNo[i], 'Ward'] = WardIn[i]
    Animal.at[IndexNo[i], 'WardCode'] = WardCodeIn[i]

IndexNo2 = [4137, 4164, 4245, 4708, 5896, 6338, 6767, 7071]
BoroughIn = ['Epping Forest', 'Epping Forest', 'Epping Forest', 'Epping Forest', 'Medway', 'Slough', 'Epsom and Ewell', 'Epsom and Ewell']
BoroughCodeIn = ['E07000072', 'E07000072', 'E07000072', 'E07000072', 'E06000035', 'E06000039', 'E07000208', 'E07000208']

for i in range(len(IndexNo2)):
    Animal.at[IndexNo2[i], 'Borough'] = BoroughIn[i]
    Animal.at[IndexNo2[i], 'BoroughCode'] = BoroughCodeIn[i]

9&10. UPRN & USRN

UPRN is short for Unique Property Reference Number, USRN is Unique Street Reference Number. As these two records are not very helpful information and not relevant to our analysis. I decide to drop them.

11&12. Easting_m & Northing_m

Because we have their rounded value 'Easting_rounded' and 'Northing_rounded' with no missing values, in other words, 'Easting_m' and 'Northing_m' could be regarded as reduplicate information, thus I decide to delete these two columns as well.

In [7]:
Animal = Animal.drop(['UPRN', 'USRN', 'Easting_m', 'Northing_m'], axis = 1)

13&14. Latitude & Longitude

In aim of filling in the Latitude and Longitude value, I use the 'postcode-outcodes' dataset from https://www.freemaptools.com/download-uk-postcode-lat-lng.htm.

As requested for copyright:\ Contains Ordnance Survey data © Crown copyright and database right 2020\ Contains Royal Mail data © Royal Mail copyright and database right 2020\ Source: Office for National Statistics licensed under the Open Government Licence v.3.0

In [8]:
LongnLa = pd.DataFrame(Animal, columns = ['PostcodeDistrict', 'Latitude', 'Longitude'])
LongnLaRef = LongnLa.dropna()
LongnLaRef.shape
LongnLaRef.drop_duplicates()
LongnLaRef.shape
postcode_lat_lng = pd.read_csv('postcode-outcodes.csv')
postcode_lat_lng.rename(columns = {'postcode': 'PostcodeDistrict'}, inplace = True)
postcode_lat_lng.head()
Out[8]:
id PostcodeDistrict latitude longitude
0 2 AB10 57.13514 -2.11731
1 3 AB11 57.13875 -2.09089
2 4 AB12 57.10100 -2.11060
3 5 AB13 57.10801 -2.23776
4 6 AB14 57.10076 -2.27073
In [9]:
Animal = pd.merge(Animal, postcode_lat_lng, on = 'PostcodeDistrict', how = 'left')
Animal = Animal.drop(['Latitude', 'Longitude', 'id'], axis = 1)
Animal = Animal.rename(columns = {'latitude': 'Latitude', 'longitude': 'Longitude'})
# Because there are several missing values in the Longitude-Latitude reference dataset, 
# thus I fill them in manually according to the search result on
# https://www.townscountiespostcodes.co.uk/postcodes/postcode-longitude-and-latitude.php
Animal.loc[Animal['Longitude'] == 0.0]
# only one record, thus replace value directly
# Longitude: -0.124893 Latitude: 51.537285
Animal['Longitude'] = Animal['Longitude'].replace([0.0], '-0.124893')
Animal.loc[Animal['Latitude'] == 0.0] 
Animal['Latitude'] = Animal['Latitude'].replace([0.0], '51.537285')
# check
Animal.loc[Animal['IncidentNumber'] == '147373-18112020']
Out[9]:
IncidentNumber DateTimeOfCall CalYear FinYear TypeOfIncident PumpCount PumpHoursTotal HourlyNotionalCost(£) IncidentNotionalCost(£) FinalDescription ... Ward BoroughCode Borough StnGroundName Street PostcodeDistrict Easting_rounded Northing_rounded Latitude Longitude
7143 147373-18112020 18/11/2020 20:54 2020 2020/21 Special Service 1.0 1.0 346 298.0 ANIMAL RESCUE - DISTRESSED FOX IN TRAPPED IN S... ... ST. PANCRAS AND SOMERS TOWN E09000007 CAMDEN Euston HANDYSIDE STREET N1C 530250 183650 51.537285 -0.124893

1 rows × 27 columns

In [10]:
# use the code below to do a final check
print(Animal.isnull().sum())
IncidentNumber                0
DateTimeOfCall                0
CalYear                       0
FinYear                       0
TypeOfIncident                0
PumpCount                     0
PumpHoursTotal                0
HourlyNotionalCost(£)         0
IncidentNotionalCost(£)       0
FinalDescription              5
AnimalGroupParent             0
OriginofCall                  0
PropertyType                  0
PropertyCategory              0
SpecialServiceTypeCategory    0
SpecialServiceType            0
WardCode                      0
Ward                          0
BoroughCode                   0
Borough                       0
StnGroundName                 0
Street                        0
PostcodeDistrict              0
Easting_rounded               0
Northing_rounded              0
Latitude                      0
Longitude                     0
dtype: int64

4. Data Analysis and Visualization

4.1 Pattern of the animal rescue acts yearly and monthly

After filling the missing values, we will first have a look at the pattern of rescue act both monthly and yearly.

In [10]:
# DateTime string to datetime mode for better use
Animal['DateTimeOfCall'] = pd.to_datetime(Animal['DateTimeOfCall'])
In [12]:
Year = Animal.groupby([Animal['DateTimeOfCall'].dt.year]).agg({'count'})['IncidentNumber']
Year.reset_index(level=0, inplace=True)
plt.figure(figsize=(10,6))
plt.hist(Animal['CalYear'], bins = 12, color='#6495ED', edgecolor='black')
plt.xticks(np.arange(min(Year['DateTimeOfCall']), max(Year['DateTimeOfCall'])+1, 1))
plt.xlabel('Year')
plt.ylabel('Year Counts')
plt.ylim(0, 1000)
Out[12]:
(0, 1000)

From the diagram above, we could observe that between 2009 and 2019, the total number of annual animal rescue act fluctuated around 580. In 2020, the number of act increase 25% by that of 2019. Then we will take a look at the pattern of monthly animal rescue act.

In [13]:
Month = Animal.groupby([Animal['DateTimeOfCall'].dt.year, Animal['DateTimeOfCall'].dt.month]).agg({'count'})['IncidentNumber']
Month.reset_index(level=1, inplace = True)
Month.columns = ['Month', 'Counts']
Month.reset_index(level=0, inplace = True)
Month.rename(columns = {'DateTimeOfCall': 'Year'}, inplace = True)
Month = Month.pivot(index='Month', columns = 'Year')

ax = Month.plot(figsize=(12,5))
ax.set_xticks([1,2,3,4,5,6,7,8,9,10,11,12])
ax.set_xticklabels(('Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'July', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'))
ax.legend(bbox_to_anchor=(1,1), loc='upper left')
ax.set_ylabel('Animal Act Counts')
Out[13]:
Text(0, 0.5, 'Animal Act Counts')

We could see there is a parabola trend as shown in the diagram. The period between May and August could be seen as a fastigium of animal rescue. In addition to that, by looking at the deepest orange curve, which is the count of animal act in 2020. For half of the year, the amount acts taken is greater than previous 11 years.

4.2 Which kind of animal being rescued the most?

As we are provided with the type of animal being rescued, thus I produce a wordcloud to have a better visual sense.

In [14]:
# Found record cat and Cat, thus unify the name
Animal['AnimalGroupParent'] = Animal['AnimalGroupParent'].replace('cat', 'Cat')
count = Animal['AnimalGroupParent'].value_counts()
countdic = count.to_dict()
cat_mask = np.array(Image.open('cat.png'))
wordcloud = WordCloud(background_color="White", relative_scaling=0.3, contour_width=5, contour_color='black',
                      random_state=1, mask=cat_mask, width=1000,height=1000,margin=3, colormap='twilight').generate_from_frequencies(countdic)
plt.figure(figsize=(20,8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

As shown above, the cat, bird, dog and fox are the four most rescued animals. Cat is the animal being rescued the most, thus I masked the wordcloud with a cat image. As the difference between the counts are relatively big, I adjust the relative scaling of word cloud a little bit low - 0.3, to make sure most of the category could be seen. For a more detailed analysis, we will have a look at the count numbers.

In [15]:
count
Out[15]:
Cat                                                        3514
Bird                                                       1447
Dog                                                        1156
Fox                                                         329
Horse                                                       190
Unknown - Domestic Animal Or Pet                            187
Deer                                                        122
Unknown - Wild Animal                                        81
Squirrel                                                     63
Unknown - Heavy Livestock Animal                             49
Hamster                                                      16
Rabbit                                                       14
Snake                                                        12
Cow                                                           8
Ferret                                                        8
Sheep                                                         5
Pigeon                                                        4
Unknown - Animal rescue from water - Farm animal              3
Lizard                                                        3
Hedgehog                                                      2
Lamb                                                          2
Fish                                                          2
Goat                                                          2
Budgie                                                        2
Bull                                                          1
Unknown - Animal rescue from below ground - Farm animal       1
Tortoise                                                      1
Name: AnimalGroupParent, dtype: int64

We could see from above that the rescue act counts of cat is more than two times as great as that of bird, and more than ten times as great as that of fox. Then I start wondering why these animals need the rescue. In other words, what kind of trouble are they in?

4.3 What is the most frequent act of rescue?

There is a variable call 'SpecialServiceType', from which we could have a look at the kind of trouble these animals was in.

In [16]:
typecount = Animal['SpecialServiceType'].value_counts()
typecountdic = typecount.to_dict()
wordcloud = WordCloud(background_color="White", relative_scaling=0.4, contour_width=6, contour_color='black',
                      random_state=1, width=2100,height=1000,margin=3, colormap='ocean').generate_from_frequencies(typecountdic)
plt.figure(figsize=(22,8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

From the WordCloud above, we observe that 'Animal rescue from height - Domestic pet' is the most frequent type of rescue action. Followed by 'Assist trapped domestic animal' at the second position and 'Animal rescue from height - Bird' as the third.

4.4 Where do animals usually encounter problem?

After 'What', 'Which' questions, it comes to 'Where' problem, where does animals get in trouble the most?

In [17]:
propertycount = Animal['PropertyCategory'].value_counts()
print(propertycount)
print()
propertytypecount = Animal['PropertyType'].value_counts()
print(propertytypecount[:10])
Dwelling             3673
Outdoor              1934
Non Residential       750
Outdoor Structure     558
Road Vehicle          284
Other Residential      22
Boat                    3
Name: PropertyCategory, dtype: int64

House - single occupancy                              1855
Purpose Built Flats/Maisonettes - Up to 3 storeys      592
Purpose Built Flats/Maisonettes - 4 to 9 storeys       581
Tree scrub                                             320
Animal harm outdoors                                   280
Converted Flat/Maisonettes - 3 or more storeys         253
Car                                                    241
Domestic garden (vegetation not equipment)             220
Converted Flat/Maisonette - Up to 2 storeys            205
River/canal                                            177
Name: PropertyType, dtype: int64

To answer this query, I simply list out the counts of different property categories animal get into trouble with. Dwelling is the most common place, which is nearly twice as many as the count for Outdoor. To be more specific, I also list out the first 10 common PropertyType. We could see the majority of Dwelling being seperated into 'House - single occupancy', 'Purpose Built Flats/Maisonettes - Up to 3 storeys' and 'Purpose Built Flats/Maisonettes - 4 to 9 storeys'.

4.5 Geographical Visualization - No token needed

Let us take a look at where are these animal rescue act take place. For geographical visualization, I tried two methods, one is use OpenStreetMap tile, without needing a Mapbox Access Token; another method is the combination of Mapbox and plotly to generate an interactive map plot. Firstly, let us go for the OpenStreetMap tile to have a look at accidents in a specific year.

In [18]:
Year2009 = Animal[Animal['DateTimeOfCall'].dt.year==2009]
testAnimal = Animal.dropna()
fig = px.scatter_mapbox(Year2009, lat='Latitude', lon='Longitude',
                        hover_name='IncidentNumber', hover_data=['FinalDescription', 'IncidentNotionalCost(£)'],
                        color_discrete_sequence=['#6495ED'], zoom=3, height=300)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

I choose the data for year 2009 here to do the visualization, by moving your mouse onto the dot in the diagram, you could see the information about Latitude, Longitude, IncidentNumber, FinalDescription and IncidentNotionalCost. Analysis of location will be done in next section.

4.6 Geographical Visualization - Token needed

As mentioned at beginning, request a token from mapbox website is needed for this section.

In [19]:
with open("maptoken.json") as file:
    API = json.load(file)
mapbox_access_token = API["mapbox_access_token"]
In [20]:
yearrange = Animal['DateTimeOfCall'].dt.year.unique()
yearrange = list(yearrange)
yearrange1 = [str(i) for i in yearrange]

data = []
yearrange = Animal['DateTimeOfCall'].dt.year.unique()
yearrange = list(yearrange)
for year in yearrange1:
    year_data = dict(lat = Animal.loc[Animal['DateTimeOfCall'].dt.year == int(year), 'Latitude'],
                    lon = Animal.loc[Animal['DateTimeOfCall'].dt.year == int(year), 'Longitude'],
                    text = Animal.loc[Animal['DateTimeOfCall'].dt.year == int(year), 'FinalDescription'],
                    name = year,
                    marker = dict(size=8, opacity=0.5),
                    type = 'scattermapbox')
    data.append(year_data)
In [21]:
layout = dict(
    height = 800,
    margin = dict(t=0, b=0, l=0, r=0),
    font = dict(color='#FFFFFF', size=11),
    paper_bgcolor='#000000',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=51.509865, lon=-0.118092
        ),
        pitch=0,
        zoom=5,
        # set the default map style as light,
        # you could choose from 'light', 'dark', 'street', 'outdoors' and 'satellite'
        style='light'
    )
)
In [22]:
updatemenus = list([
    # top-left drop-down menu: select year to visualize
    dict(buttons = list([
        dict(label = '2009-2020 Animal Rescue',
            method = 'update',
            args = [{'visible': [True, True, True, True, True, True, True, True, True, True, True, True]}]),
        dict(label = '2009 Animal Rescue',
            method = 'update',
            args = [{'visible': [True, False, False, False, False, False, False, False, False, False, False, False]}]),
        dict(label = '2010 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, True, False, False, False, False, False, False, False, False, False, False]}]),
        dict(label = '2011 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, True, False, False, False, False, False, False, False, False, False]}]),
        dict(label = '2012 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, True, False, False, False, False, False, False, False, False]}]),
        dict(label = '2013 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, True, False, False, False, False, False, False, False]}]),
        dict(label = '2014 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, True, False, False, False, False, False, False]}]),
        dict(label = '2015 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, True, False, False, False, False, False]}]),
        dict(label = '2016 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, False, True, False, False, False, False]}]),
        dict(label = '2017 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, False, False, True, False, False, False]}]),
        dict(label = '2018 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, False, False, False, True, False, False]}]),
        dict(label = '2019 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, False, False, False, False, True, False]}]),
        dict(label = '2020 Animal Rescue',
            method = 'update',
            args = [{'visible': [False, False, False, False, False, False, False, False, False, False, False, True]}]),
    ]),
        direction = 'down',
        x = 0.01,
        xanchor = 'left',
        y = 0.99,
        yanchor = 'bottom',
        bgcolor = '#000000',
        bordercolor = '#FFFFFF',
        font = dict(size=11)),
    
    
    # bottom-right drop-down menu: choose the style provided by mapbox
    dict(
        buttons=list([
            dict(
                args=['mapbox.style', 'light'],
                label='Light',
                method='relayout'
            ),
            dict(
                args=['mapbox.style', 'dark'],
                label='Dark',
                method='relayout'
            ),                    
            dict(
                args=['mapbox.style', 'outdoors'],
                label='Outdoors',
                method='relayout'
            ),
            dict(
                args=['mapbox.style', 'satellite-streets'],
                label='Satellite-Streets',
                method='relayout'
            )                    
        ]),
        direction = 'up',
        x = 0.8,
        xanchor = 'left',
        y = 0.01,
        yanchor = 'bottom',
        bgcolor = '#000000',
        bordercolor = '#FFFFFF',
        font = dict(size=11)
    ),    
])

layout['updatemenus'] = updatemenus
In [23]:
figure = dict(data=data, layout=layout)
plotly.offline.iplot(figure)

You could zoom in the interactive map above, and when you move the mouse to any point, it will tell you the longitude, latitude information and the description of that rescue act. At the right end of the description bar, year information is indicated. By using the manu bar at the left-top corner, you could choose to visualize one specific year's rescue location data or the total rescue location data. There is also a manu bar at the bottom-right corner, they are themes provided by mapbox, you could choose whichever you like. \ We could see from the dot distribution, the Rescue Action are concentrated in the Central London, City of London and Westminster. Other Rescue Location are scattered around.

5. Model Construction on incurred cost

As this dataset did not provide much numerical data, thus limited numerical data analysis could be done. I will have a simple analysis on the records of the incurred cost to take a look at the trend of the hourly notional cost.

In [22]:
test = Animal.copy()
test['Month'] = test['DateTimeOfCall'].dt.month
X = test[['CalYear', 'Month']]
y = test['HourlyNotionalCost(£)']
In [25]:
f, ax = plt.subplots(figsize=(20,10))
fig = sns.boxplot('CalYear', 'HourlyNotionalCost(£)', data=test)
In [26]:
trainset = X.copy()
trainset['HourlyNotionalCost(£)'] = y
sns.heatmap(trainset.corr(), annot=True, cmap='Paired')
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb39635d68>

By observing the box plot of the HourlyNotionalCost according to year in general, a linear relationship seems to be shown. Thus I will try linear model first.

5.1 Linear Model

In [27]:
reg = linear_model.LinearRegression()
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
reg.fit(x_train, y_train)
print('Coefficient of variable:', reg.coef_)
y_pred = reg.predict(x_test)
print('The Mean Square error of linear regression on test set is:', np.mean((y_pred-y_test)**2)) # mean-square error
reg.score(x_test, y_test)
print('Accuracy of linear regression on the test set: {:.2f}'.format(reg.score(x_test, y_test)))
Coefficient of variable: [9.11999709 0.29723307]
The Mean Square error of linear regression on test set is: 69.99396202814825
Accuracy of linear regression on the test set: 0.94

5.2 Logistic Regression

In [28]:
logreg = LogisticRegression(solver='lbfgs')
logreg.fit(x_train, y_train)
y_pred1 = logreg.predict(x_test)
print('The Mean Square error of logistic regression on test set is:', np.mean((y_pred1-y_test)**2))
print('Accuracy of logistic regression on the test set: {:.2f}'.format(logreg.score(x_test, y_test)))
The Mean Square error of logistic regression on test set is: 2590.3213087248323
Accuracy of logistic regression on the test set: 0.34

The Linear model fits well on our dataset with the accuracy 0.94, while logistic regression behaves badly, with accuracy rate 0.34. This result is as expected as the response is not binary-valued, thus logistic regression might not be a good candidate model. The mean square error index also revealed the same conclusion. Thus we will go for linear model for incurred cost prediction. To be more specific, the coefficient for year is 9.12, that for month is 0.30.

6. Conclusion

From the above analysis and visualization, we could conclude that:

  1. The trend pattern: the number of annual animal rescue acts fluctuate between 2009 and 2019, and increased 25% in 2020 compared to 2019. Besides, the period between May and August is the fastigium of animal rescue.
  2. Cat is the animal being rescued the most and animals are often get in trouble with height.
  3. The rescue act are concentrated in the Central London, City of London and Westminster. The left are scattered around.
  4. For hourly notional cost, we could use data of year and month, with the help of linear model, to achieve a 94% accuracy prediction.

For further study, in this dataset there are a interesting correlations relationship between variables. To be more specific, according to content in 'FinalDescription', we could conclude out information about 'AnimalGroupParent' and guess on 'PropertyType'. For example, from 'DOG WITH JAW TRAPPED IN MAGAZINE RACK,B15', we could extract the animal type -- dog; and magazine rack is the clue for 'PropertyType' -- Dwelling. Therefore, I am thinking of whether a function/programme could extract the animal type and guess the PropertyType from the description.

Personally, during the trend pattern analysis, the increase of animal rescue acts in 2020 is out of my expectation. 2020 is a tough year for everyone, while the animal rescue act rises. I want to thanks for London Fire Commissioner first. Apart from their own job, they also provide this special service and spend time and money on it. Secondly, I sincerely hope more and more people would be willing to know more about other creatures and give a helping hand when they are in need.